Partial Correlation Screening for Estimating Large Precision Matrices, with Applications to Classification
Abstract
Given n samples X1, X2, . . . , Xn from N(0, Σ), we are interested in estimating the p × p precision matrix Ω = Σ−1; we assume Ω is sparse in that each row has relatively few nonzeros. We propose Partial Correlation Screening (PCS) as a new row-by-row approach. To estimate the i-th row of Ω, 1 ≤ i ≤ p, PCS uses a Screen step and a Clean step. In the Screen step, PCS recruits a (small) subset of indices using a stage-wise algorithm: in each stage, it adds to the recruited set the index j that has the largest empirical partial correlation (in magnitude) with i, given the indices recruited so far. In the Clean step, PCS reinvestigates all recruited indices, removes false positives, and uses the resulting set of indices to reconstruct the i-th row. PCS is computationally efficient and modest in memory use: to estimate a row of Ω, it needs only a few rows (determined sequentially) of the empirical covariance matrix. PCS can estimate a large Ω (e.g., p = 10K) in a few minutes. Higher Criticism Thresholding (HCT) is a recent classifier that enjoys optimality, but to exploit its full potential, we need a good estimate of Ω. Note...
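The Screen and Clean steps for a single row can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `partial_corr` and `pcs_row`, the thresholds `t_screen` and `t_clean`, the cap `max_size`, and the choice to reconstruct the row by inverting the covariance block on {i} ∪ B are all hypothetical choices made here. The partial correlation of i and j given a set A is read off the inverse of the covariance sub-block on (i, j, A). For simplicity the sketch takes the full p × p covariance S, whereas PCS itself needs only a few of its rows.

```python
import numpy as np


def partial_corr(S, i, j, A):
    """Partial correlation of variables i and j given the index set A.

    Inverts the (i, j, A) block of the covariance S and returns
    -P[0, 1] / sqrt(P[0, 0] * P[1, 1]).
    """
    idx = [i, j] + list(A)
    P = np.linalg.inv(S[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])


def pcs_row(S, i, t_screen=0.1, t_clean=0.1, max_size=10):
    """Hypothetical sketch of one PCS-style row estimate: Screen, Clean, reconstruct.

    t_screen, t_clean and max_size are illustrative tuning parameters,
    not values taken from the paper.
    """
    p = S.shape[0]

    # Screen: stage-wise, add the index with the largest |partial correlation|
    # with i given the indices recruited so far; stop when none exceeds t_screen.
    A = []
    while len(A) < max_size:
        candidates = [j for j in range(p) if j != i and j not in A]
        if not candidates:
            break
        scores = [abs(partial_corr(S, i, j, A)) for j in candidates]
        best = int(np.argmax(scores))
        if scores[best] < t_screen:
            break
        A.append(candidates[best])

    # Clean (assumed criterion): keep a recruited index k only if its partial
    # correlation with i, given the other recruited indices, is at least t_clean.
    B = [k for k in A
         if abs(partial_corr(S, i, k, [m for m in A if m != k])) >= t_clean]

    # Reconstruct (assumed approach): row i of the inverse of the ({i} + B)
    # block of S, scattered back into a length-p row with zeros elsewhere.
    idx = [i] + B
    P = np.linalg.inv(S[np.ix_(idx, idx)])
    row = np.zeros(p)
    row[idx] = P[0]
    return row
```

On data, one would form the empirical covariance from the n × p sample matrix, for example `S = X.T @ X / n` since the mean is known to be zero, and call `pcs_row(S, i)` for each row i. When S is the exact Σ of a chain-structured (tridiagonal) Ω, the Screen step stops as soon as i's neighbors are recruited, because every remaining partial correlation is then zero.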
Similar articles
Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification
Background: In the current study, a hybrid feature selection approach combining filter and wrapper methods is applied to several bioscience databases with varying numbers of records, attributes, and classes; this strategy thus enjoys the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose disease status and estimate patient survival. Method...
Improving Chernoff criterion for classification by using the filled function
Linear discriminant analysis (LDA) is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it cannot handle data in which the classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion for measuring distances between distributions. In the p...
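For two Gaussian classes p1 = N(μ1, Σ1) and p2 = N(μ2, Σ2), the Chernoff distance −log ∫ p1^α p2^(1−α) dx (0 < α < 1) has the standard closed form α(1−α)/2 · Δμᵀ M⁻¹ Δμ + ½ ln(|M| / (|Σ1|^(1−α) |Σ2|^α)), with Δμ = μ1 − μ2 and M = (1−α)Σ1 + αΣ2; α = 1/2 gives the Bhattacharyya distance. The sketch below only evaluates that formula. The function name `gaussian_chernoff_distance` and the convention of putting α on the first density are choices made here, not details taken from the cited paper, which the snippet does not describe further.

```python
import numpy as np


def gaussian_chernoff_distance(mu1, S1, mu2, S2, alpha=0.5):
    """Chernoff distance -log of the integral of p1**alpha * p2**(1 - alpha)
    for p1 = N(mu1, S1) and p2 = N(mu2, S2), with 0 < alpha < 1.

    Hypothetical helper; alpha = 0.5 gives the Bhattacharyya distance.
    """
    mu1 = np.atleast_1d(mu1).astype(float)
    mu2 = np.atleast_1d(mu2).astype(float)
    S1 = np.atleast_2d(S1).astype(float)
    S2 = np.atleast_2d(S2).astype(float)
    d = mu1 - mu2
    # Mixed covariance: weight (1 - alpha) on S1 and alpha on S2.
    M = (1 - alpha) * S1 + alpha * S2
    mean_term = 0.5 * alpha * (1 - alpha) * d @ np.linalg.solve(M, d)
    # slogdet avoids overflow/underflow of determinants in high dimension.
    _, logdet_M = np.linalg.slogdet(M)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_M - (1 - alpha) * logdet_1 - alpha * logdet_2)
    return float(mean_term + cov_term)
```

The covariance term vanishes when Σ1 = Σ2, leaving a scaled Mahalanobis distance between the means; it is this term that lets the criterion separate classes with unequal covariance matrices, which LDA alone cannot.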
Perform Three Data Mining Tasks with Crowdsourcing Process
In data mining studies, because performing the feature selection process by hand is complex, some of the labeling has to be sent to workers through crowdsourcing. Outsourcing data mining tasks to users is often handled by software systems that lack sufficient knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...
Estimating Sparse Precision Matrices from Data with Missing Values
We study a simple two-step procedure for estimating sparse precision matrices from data with missing values, which is tractable in high dimensions and does not require imputation of the missing values. We provide rates of convergence for this estimator in the spectral norm, Frobenius norm and element-wise ℓ∞ norm. Simulation studies show that this estimator compares favorably with the EM algori...
Publication date: 2016